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Abstract 

We consider the problem of sensor selection for event detection in wireless sensor networks (WSNs). 
We want to choose a subset of p out of n sensors that yields the best detection performance. As the 
sensor selection optimality criteria, we propose the Kullback-Leibler and Chernoff distances between the 
distributions of the selected measurements under the two hypothesis. We formulate the maxmin robust 
sensor selection problem to cope with the uncertainties in distribution means. We prove that the sensor 
selection problem is NP hard, for both Kullback-Leibler and Chernoff criteria. To (sub)optimally solve 
the sensor selection problem, we propose an algorithm of affordable complexity. Extensive numerical 
simulations on moderate size problem instances (when the optimum by exhaustive search is feasible to 
compute) demonstrate the algorithm's near optimality in a very large portion of problem instances. For 
larger problems, extensive simulations demonstrate that our algorithm outperforms random searches, once 
an upper bound on computational time is set. We corroborate numerically the validity of the Kullback- 
Leibler and Chernoff sensor selection criteria, by showing that they lead to sensor selections nearly 
optimal both in the Neyman-Pearson and Bayes sense. 

Keywords: sensor selection, event detection, robust optimization, Chernoff distance, Kullback-Leibler 
distance 
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I. Introduction 

Wireless sensor networks (WSNs) operate with limited power and communication resources. When 
observing phenomena with WSNs, a major challenge is to balance the tradeoff between the quality 
and the cost of operation. A fundamental problem of this kind, the sensor selection problem, is how 
to optimally select a limited subset of sensors (hence limiting the operation cost) that gives the most 
valuable information about the observed phenomena. 

Problem statement. This paper studies the sensor selection problem for event detection in WSNs. Nature 
can be in one of two states: H\ (event occurring, e.g., target present) and Hq (event not occurring, 
e.g., target absent). A WSN, composed of n sensors, instruments the nature. The distribution of the 
n-dimensional measurement vector is assumed Gaussian under the two hypothesis, with different means 
mi and covariances Si, i = 0, 1. We assume that, due to inherent WSN constraints, such as power, only 
p (out of n) sensors can sense and transmit their readings to a fusion node; based on the received p 
readings, the fusion node performs detection, i.e., decides which of the two hypothesis is true. We ask 
the following question: Which p sensors should be chosen to achieve the best detection performance? 

Each possible p-sensor selection induces a p dimensional Gaussian distribution tt{ (of selected sensors) 
under hypothesis Hi, i = 0, 1. Intuitively, p-sensor selection that yields more distant distributions 7Ti and 
7To leads to better detection. Hence, we propose, as sensor selection optimality criteria: 1) the Kullback- 
Leibler (KL) distance; and 2) the Chernoff (C) distance between tt\ and ttq. In practice, the distribution 
parameters (mj, Si) are estimated from training data, and may not be known exactly, but only within an 
uncertainty region. We thus formulate the robust maxmin sensor selection problem of maximizing the 
KL (or C) distance between the selected distributions m and ttq, for the worst case of parameter drifts. 
In this paper, we address the case when only the means of the two distributions are uncertain, relegating 
the general case for future work. 

Contributions. The problem of evaluating the best p-sensor subset is combinatorial; checking over all 
(p) possible combinations becomes infeasible when n and p are sufficiently large. We indeed prove that 
the KL and C sensor selection problems are NP hard; hence, it is unlikely to find an efficient algorithm 
that solves large instances of these problems. To (suboptimally in general) solve the sensor selection 
problems, we propose a computationally affordable algorithm. For example, to select 10 out of 100 
sensors, our algorithm takes only few seconds on a current generation personal computer. There is no 
theoretical guarantee that our algorithm produces an optimal or near optimal solution; however, extensive 
numerical experiments demonstrate that our algorithm produces an optimal (or near optimal) solution in 
most cases. 
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The KL and C distances are only heuristic optimality measures for the detection performance; the exact 
(yet in our case intractable) criterion is the probability of detection error (either in Neyman-Pearson or 
Bayes sense.) However, interestingly enough, we show by simulations that optimizing the KL and C 
distances yields to sensor selections that are very close to the optimal probability of error (either in 
Bayes or Neyman-Pearson sense). Moreover, they are often indeed optimal. 

Selecting p out of n sensors is equivalent to finding the nxp selection matrix with 0/1 entries that maps 
the n-dimensional vector x of all measurements to the p-dimensional vector y of selected measurements. 
Our methodology for solving the combinatorial sensor selection problems relies on enlarging the search 
space from the 0/1 selection matrices to the set of Stiefel matrices (the matrices with orthonormal 
columns). Then, after solving for the Stiefel matrix, we project it back to the set of 0/1 selection matrices. 
The relaxed Stiefel problem corresponds to finding the linear map R n — > W which maximizes the (worst- 
case) KL (C) distance between the projected distributions in the lower p-dimensional space. To our best 
knowledge, existing work on this topic either does not solve the problem in full generality (i.e., unequal 
means and covariance matrices) or does not guarantee global optimality of their solutions (see [1], [2] 
for problems involving Chernoff distance and the closely related J-divergence). A major contribution of 
this paper is that we solve this nonconvex problem globally for the case p = 1, and in full generality, 
by reducing it to scalar (ID) problem over a compact interval. We tackle the generic case p > 1 via 
an incremental, greedy approach, based on the ID case, which provides near optimal result with small 
computational cost. 

This paper is related to our prior work [3], [4], which also considers sensor selection for event detection, 
but only based on the KL distance. With respect to KL distance, this paper provides a new heuristic 
with reduced complexity; more importantly, this paper studies the problem with respect to the Chernoff 
distance, which we did not consider in [3], [4]. With respect to [3], [4], this paper also contributes by 
validating the KL and C distances as good optimality criteria by showing their near optimality in the error 
probability sense, and by establishing NP hardness of the corresponding sensor selection optimization 
problems. 

Finally, we would like to note that the KL-based and C-based sensor selection problems could be, 
in principle, globally solved (inefficiently) by, e.g., branch and bound methods [5], [6]. However, the 
computational time of such methods is often very long, even for modest values of n and p. We discuss 
in more detail the challenges to solve the sensor selection problems that we address at the end of 
subsection II-B. 

Review of existing work on sensor selection. Sensor selection problems have been extensively studied in 
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different contexts, including target tracking (e.g., [7]), target localization (e.g., [8]), robotics (e.g., [9]), and 
wireless sensor networks (e.g. [10], [11]). Generally, all their work aims to optimize certain measure of 
performance of the system (e.g., utility [12], information theoretic measure [7], [13], estimation error [14]) 
subject to energy constraints (e.g., limited number of sensors to be selected [14]). Sensor selection can be 
geometric-based (e.g., [15]), or information-theoretic based (e.g., [7], [13]). Our work belongs to the class 
of sensor selection problems for inference tasks, i.e., sensor selection for estimation and detection, and it 
adopts the information-theoretic point of view. Reference [14] considers the problem of sensor selection 
for parameter estimation in WSNs, proposing to select the subset of p (out of n) sensors that minimize 
the determinant of the estimator covariance matrix. Reference [16] proposes distributed algorithms to 
(suboptimally) solve the sensor selection problem formulated in [14]. Reference [17] addresses the 
problem of selecting the maximal number of reliable sensors for estimation. Reference [ 1 8] shows, through 
the optimal experiment design framework ([19]) and using convex analysis, that optimal estimation is 
achievable by using only a relatively small number of sensors. 

Paper organization. Section II introduces the model and formulates the sensor selection optimization 
problems. Section III details the algorithms for solving the robust sensor selection problems, in the 
presence of uncertainties in the means of the distributions. Section IV considers the special, yet important 
case, when there are no uncertainties in the distribution parameters. Section V demonstrates numerically 
that the KL and the C distances are appropriate metrics for sensor selection. Section VI shows near 
optimal performance of the proposed algorithms. Finally, Section VII concludes the paper. 

II. Problem model: Formulation of the sensor selection optimization problem 
A. Problem model 

We assume that nature can be in one of two states: H\ -event occurring, and Ho-event not occurring. 
Let x G W l denote the vector that collects all sensor measurements (one measurement per sensor). We 
assume that x is Gaussian under the hypothesis H\ (respectively, Hq,) with generic means and covariances 
(mi, Si) (respectively, (mo, So)), i.e., 

H : x ~ Af(m , S ) 
Hi : x ~ jV(mi, Si) 

where jV(/x, S) denotes Gaussian distribution with the mean vector fi and the covariance matrix X. The 
Gaussian assumption on x is standard and can be, in many applications, justified, e.g, by central limit 
theorem arguments, see, e.g., [20], [21]. Noise correlation (i.e., non diagonal covariance matrices) is 
important to take into account in dense deployments of WSNs. We note that our formulation allows for 
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different covariances under two hypothesis (S± ^ So); in many applications, e.g., power-based detection of 
primary users for cognitive radios (see [21] for details), accounting for different covariances is essential. 1 
Sensor selection. Sensors transmit their measurements to a fusion node, which conducts the hypothesis 
test.Due to power constraints, only p sensors, p < n, perform their measurements and transmit them 
to the fusion node. We address the problem of selecting the p sensors that guarantee the best detection 
performance. Mathematically, selecting p out of n sensors can be represented by a linear map IR n — > 
R p , x i — y y = E x, where E G R raxp is a rank-p matrix that has exactly one unit entry per column, 
corresponding to a chosen sensor, and the other entries in columns being zero. The columns of E are 
orthonormal, i.e., E J E = I p , where I p denotes the p x p identity matrix. We refer to matrix E as the 
sensor selection matrix. 

Hypothesis test induced by E. Conditioned on Hi, % = 0, 1, y is a linear transformation of a Gaussian 
vector x. Thus, y, under Hi, has the following distributions: 

H : y~M(E T mo,E T S E) (1) 
H x : y^M{E T m 1 ,E T S 1 E). 

The fusion node performs the following log-likelihood ratio (LLR) decision test: 

log — — — ^ 7, (2) 

fo(y;E) Ho 

where ;E), i = 0, 1, is the density of N (E T mi, E T SiE) and 7 £ 1 is the test threshold, [2]. 
B. Formulation of the sensor selection optimization problem 

Sensor selection optimality criteria. Detection performance is, as noted above, generally quantified 
by the error probabilities, PpA, Pd and (P e ). However, in the problem that we consider, none of the 
probabilities above admit closed form expression, and their minimization with respect to sensor selection 
is a very hard problem. As sensor selection optimality criteria, we propose the Kullback-Leibler (KL) 
and the Chernoff (C) distance between the tested distributions. Given two distributions, with densities /1 
and fo, KL and C distances measure dissimilarity between /1 and /o, and they are defined as follows: 



£>kl(/i||/o) == / log£^/iOr)dx 
J fo{x) 

£>c(/i,/o) := max -log f ff(x)f^ s (x)d. 
se[o,i] J 



'Generally, our model applies also if x contains samples from multiple sensors over multiple sample times, that is, several 
entries in x correspond to same sensor. In such scenario vector x accounts both for temporal and spatial correlation between 
observations. To focus the presentation, however, we assume that different entries of x correspond to different sensors. 
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Thus, our goal is to find a sensor selection that yields the maximal KL or C distance between the projected 
Gaussian distributions, Af(E r m 1 , E T SiE) and Af(E T m , E T S E) (see (1)). 

We are motivated to choose KL and C distances, as sensor selection optimality criteria, by two 
fundamental results from detection theory: Chernoff-Stein lemma and Chernoff lemma. Chernoff-Stein 
lemma (resp. Chernoff lemma) states that, when the number of independent identically distributed (i.i.d.) 
observations grows, the rate of exponential decay of probability of false alarm, Pfa, (resp. probability 
of error, P c ,) of the Neyman-Pearson optimal (resp. Bayes optimal) test equals the KL (resp. C) between 
the two distributions. Thus, for large number of samples, more distant distributions (in either KL or C 
sense) lead to better detection performance. Probabilistic distance measures have been often used in the 
literature as heuristics for detection problems (see, e.g., [1], [22] for applications in linear dimensionality 
reduction) and have shown excellent results, even when the number of samples is very small or even 
equal to one, see [22]. In section V, we demonstrate by numerical tests that the KL and C distances 
are indeed excellent criteria for sensor selection, exhibiting near optimal performance in the probability 
of error sense. Section V shows that, generally, C distance has an advantage in the regimes of high 
probability of detection (Pd) (upper part of the receiver operating characteristics (ROC) curve), while 
KL has an edge in the regimes of low PpA (lower left part of the ROC curve). 

Robustness against uncertainty in distribution means. We consider the case when the true distribution 
parameters (rrii, Si), i = 0, 1 are not exactly known at the fusion node detector (2). Fusion node has 
their estimates, (jhi,S^j, which can be obtained, e.g., in the network training phase. Thus, there is a 
mismatch between the distribution used by the fusion node detector and the distribution that generates the 
observations. Our goal is to design a sensor selection that yields detection (2) robust to these mismatches. 
In this paper, we restrict our attention to the case where only the mean values are uncertain (the true 
covariance matrices are known), and we allow the mean values to drift in the following ellipsoidal 
uncertainty regions: 

mi S £ (rhi,ki S' 1 ) i = 0, 1. (3) 

Here denotes the estimated mean vector, i = 0, 1, £(a, A) (A is a positive definite matrix) denotes 
the ellipsoid 

£ (a, A) = {x G R n : (x - a) T A(x - a) < 1}. 

and the parameter hi £ (0, +oo] is a free parameter which controls the "size" of the uncertainty region, 
e.g, if ki = +oo, there is no uncertainty: m 8 = mj. The orientations of the uncertainty ellipsoids in (3) 
are induced by the covariance matrices Sq and Si. This choice of the form of uncertainty regions is 
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motivated by the following fact: if the means are estimated via the sample mean estimator based on 
N i.i.d. observations (the minimum variance unbiased estimate for Gaussian distributions), then the 
covariance of the estimate fhi equals jjSi, for i = 0, 1. A standard measure of the uncertainty, the 
confidence region ([2]), is for mi given by (3). The scaling constants ko and k\ are, in this context, 
proportional to N; also, if N is known, ko and k\ can be used to design the uncertainty regions of 
desired confidence levels. 

We address the uncertainties in the mean vectors adopting the worst case approach. That is, we search 
for the sensor selection that gives the maximal distance (KL and C) for the worst case of the mean 
parameter drift. 

Optimization problems. We introduce the following two functions, /kl and fc, that capture the depen- 
dence of the KL and the C distance on the selection matrix E and the mean vectors mo and m\. 

/ KL (£,mo,™i) := D KL (M(E T m u E T S 1 E)\\M(E T mo,E T S E^j 
max ae[0> i]/c(s,£,mo,mi) := D c (N{E T m u E T S 1 E), Af{E T mo, E T S E)^ . 

The robust sensor selection optimization problems are then given as follows: 
maximize min , _i\ ^ c /~ , c-i\fKi<(E,mn,mi) 

subject to Eij e {0, 1} (4) 
E T E = I p 

maximize ™m moee{fhoM s^), mi eS{fh u k, s^ 1 ) max se[o,i]/c(s, E, m , mi) 
subject to E^ G {0, 1} • (5) 

E T E = I p 

It can be shown that (tr(-) and |-| denote the trace and the determinant, respectively): 

fK L {E,m ,m 1 ) = X - | (mi - m ) T E (e t S E^ ^ E T {m x - m ) + tr ^E T S Q E^j ^ E T S±E 

\E T SiE\ 1 

fc(s,E,m ,m 1 ) = ^ js(l - s)(mi - m ) T E (sE T S E + (1 - s)E T * E T (mi - m ) 

\E T S E\ S \E t SiE\ 1-5 1 
" l0g \sE T S E + (1 - s)E T S 1 E\ J ' (7) 

Optimization problems (4) and (5) are combinatorial. When n and p are small, a simple method for 
solving them is exhaustive search that checks all (™) sensor subsets (i.e. selection matrices). For large n 
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and p, however, this method becomes computationally infeasible. Indeed, we have the following result, 
which we prove in the Appendix A: 

Theorem 1 The optimization problems (4) and (5) are NP hard (even when ko = k\ = +oo and S\ = So). 

In principle, besides exhaustive search, problems (4) and (5) can be (inefficiently) solved by branch and 
bound methods, see, e.g., [6]. However, complexity of these methods relies strongly on the choice of 
bounds on the cost function; finding tight bounds is a hard problem itself. An interesting method for 
solving a problem somewhat similar to ours is proposed in [23]. In [23], the authors address the problem 
of finding the most informative locations, for future sensor placements, in a discretized Gaussian field; the 
measure of informativeness that the authors propose is the mutual information between sensed and un- 
sensed locations. The authors show that the mutual information is a submodular function, which assures 
that simple strategy of choosing sensor by sensor (greedy) gives a solution within 1 — - of the optimum. 
In our problems, however, such a bound on the greedy strategy (of selecting sensors one by one) does 
not hold, as our cost functions (KL and C distance) are not submodular (see the proof in Appendix B). 
In fact, in our problems, greedy performs poorly in many cases. The reason for this lies in the fact that 
the correlations-an information that greedy discards can play an important role in discerning between 
the two hypothesis. 



In this section, we present our algorithms, R-KL (robust KL based selection) and R-C (robust C- 
based selection), that, respectively, solve the problems (4) and (5). First, we explain the methodology 
and the structure of the algorithms, and we explain the geometrical intuition behind this methodology. 
Subsections III-A and III-B detail the R-KL and R-C algorithms. 

Algorithms methodology and structure. Geometrically, one combination of p sensors defines one p- 
dimensional subspace of W 1 , spanned by a set of canonical basis vectors corresponding to the chosen 
sensors. We call this subspace a canonical subspace. The cost functions (6) and (7) depend on E only 
through its range, and problems (4) and (5), in this sense, search for the best subspace, among Q) 




canonical subspaces, on which the original distributions should be projected. We relax these combinatorial 
problems by allowing for projections to arbitrary p-dimensional subspaces. Mathematically, this translates 
into replacing the set of 0/1 selection matrices with the set of n x p Stiefel matrices (that represent all 
p-dimensional subspaces). Then, we use a solution of the relaxed problem and "round" it by the closest 
canonical subspace. 

We call the first phase of our algorithm, that solves the relaxed Stiefel problem, the Relaxation phase; 
the second phase, in which we find the closest canonical subspace, is the Projection phase. Finally, 



III. Sensor selection algorithms 
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the last, third step in our algorithms, the Refinement phase, refines the solution by performing local 
optimization. 

A. Algorithm for the robust Kullback-Leibler based selection: the R—KL algorithm 
1) Relaxation phase: We solve the following Stiefel relaxation of the problem (4): 

maximize min^^^ s^),m 1 e£(fh 1 ,k 1 Sr 1 ) ^ KL ^' m °' mi ) (g) 
subject to E T E = I p . 

The relaxed problem (8) is nonconvex and still difficult to solve. We globally solve this problem for the 
case p = 1, i.e., we find the best 1-dimensional projection. General, p > 1 case, is addressed by a greedy 
approach, using our 1 — D tool. We first detail the algorithm that solves p = 1 case. 

(1.1) Case p = 1: global solution For p = 1, the constraint set of Stiefel matrices reduces to a sphere 
in R n and the problem (8) takes a simplified form: 

maximize min cF( ~ , \ i ^ T ^~^)f + e^e _ { e^e _ A 

m &£(m ,koS J, 2 | e 1 S e e ' S e to e 1 S e J 

m 1 e£(fh u k 1 Sr 1 ) • (9) 

subject to e T e = 1 

Our major contribution is showing that the problem (9) reduces to a search over a compact (one- 
dimensional) interval. We achieve this by a series of judicious problem reformulations, and by invoking 
convexity of quadratic mappings, [24], [25]. It is important to note that the original problem (9) has in 
general very high dimensionality (equal to the total number of sensor in the network n); also, due to its 
nonconvexity, it is very difficult to solve globally. By doing reformulations, we manage to map it to a 
tractable, scalar problem. The next Lemma, proved in [4], states the first step towards this goal. It shows 
that a solution of (9) can be reconstructed after solving a 2 dimensional problem (10). 



Lemma 2 Suppose (x*, y*) solves 

maximize ^kl(^,J/) 
subject to (x, y) G 1Z 



(10) 



where 



^>Kh(x,y) = x-\ogx + i(vv~ | ' ^ 

11 = |(x,y) G R 2 : x = v T Sv, y = v T Mv, for some v £ R n , v T v = lj , (12) 



+ = max(0, x), S = S 1 ^ 2 S'iS' 1 ^ 2 , m = S^ 1 ^ 2 (mi — mo) , M = mm T . Let v* G R n be an unit-norm 
that 
solves (9). 



vector that generates x* and y* , i.e. x* = v* J Sv* and y* = v* T Mv*. Then, e* := S 1 ^ 2 v*/\\S l ^ 2 v* 
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Lemma 2 says that, in order to solve (9), it suffices to search over the set Kcl 2 . For n > 3, the set 
1Z is compact and convex, as the image set of a unit sphere under two quadratic mappings, see [24]. 
Note that, since V'kl(-) is continuous and 1Z is compact, there is a global maximizer, by the Weierstrass 
theorem. Next lemma further simplifies the search by asserting that the boundary of 1Z contains a global 
maximizer. For the proof of Lemma 3, see [4]. 

Lemma 3 The boundary dlZ of the set 1Z contains a global maximizer of (10). 

The boundary of 1Z is a closed curve in IR 2 . Our strategy consists in circulating along dR to spot a global 
maximizer. More precisely, we will sample dR with a finite set of points and pick the best point. To 
implement this strategy, we borrow the following theorem from [25]. 

Theorem 4 ([25]) Let n > 3 and let A, B be n x n symmetric matrices. Let 

K(A,B) = G R 2 : x = v T Av, y = v T Bv, for some v G R n , v T v = lj . 

For t G [0, 27r], let C(t) = A cost + Bs'mt and let A m i n (t) be the minimal eigenvalue of the matrix 
C(t) and u m - m (t) an associated unit-norm eigenvector. Suppose that X m - m (t) is a simple eigenvalue of 

C(t) for all t G [0, 27r]. Then, the set 1Z(A, B) is strictly convex and its boundary is given by 

dTZ(A,B) = {(x(t),y(t)) : t G [0, 27r], x(t) = u min (t) J Au min (t) , y(t) = u mill (t) T Bu 

min 

Parametrization of dlZ. Theorem 4 shows that the set TZ(A, B) (in our context, the set 1Z = 1Z(S, mm T )) 
can be parameterized by moving a single parameter t over the compact interval [0, 27r]. The theorem 
assumes that A m i n (i) is simple for all t, and, consequently, that 1Z(A,B) is strictly convex. However, a 
parametrization of the boundary when this condition is not satisfied is readily available, as we explain 
next. Applied to our set 1Z in (12), this leads to the following procedure: 

1) generate the points 

{xk, Vk) = {u~lSu k , u[mm T u k ), k = l,2,...,K, 
where u k denotes an unit-norm eigenvector corresponding to the minimal eigenvalue of 

C k = S cos ((k - 1)2tt/K) + mm T sin ((k - 1)2tt/K) . 

Here, K is the user-defined grid size and {(xk, yk) '■ k = 1, . . . , K} is an initial sample of dlZ; 

2) if the distance between two consecutive points (xk, yk) and (xk+i, yk+i) is greater than a prescribed 
threshold, interpolate the line segment which connects them, i.e., consider 

= ( X ~ j/ J ) ( x k,Vk) + j/J(x k+1 ,y k+1 ) , j = 0,1,..., J. 
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In summary, our sampling of dlZ is dlZ = {(xk,yk)} U ■ 

Solving (9). We explained how to solve (10) by parameterizing dlZ. Now we explain how a solution 
of (10), x*,y*, is sufficient to reconstruct v*, a solution of (9). Vector v* in Lemma 2 can be found as 
follows. Let 

(x*,y*) £ arg max_ ipKh(x,y). 

(x,y)e&R 

That is, (x*,y*) denotes the best point in dlZ. If (x*,y*) € {(xk,yk)}, say (x*,y*) = (xk*,yk*), then 
we can take v* as an unit-norm eigenvector associated with the minimal eigenvalue of Ck*. Otherwise, 
(x*,y*) G \ 2/i^)|> an d we nee d to solve the system of 3 quadratic equations: 

v Sv = x*, v T mm T v = y*, v T v = 1, (13) 

with respect to (w.r.t.) v. Any solution can be taken as v*. It can be shown that (13) can be efficiently 
solved by solving a convex problem. 

1.2) Case p > 1: greedy algorithm Optimization problem (8) for the case p > 1 is very difficult to 

solve globally; we propose a greedy, suboptimal approach. We construct the columns of the matrix E = 

[ei e2 . . . e p ] one by one (in the order e±, ei, •••)• We construct the j-th column by solving (9), with the 

constraint that the column ej must be orthogonal to the previously determined columns e\, ei, ... 

i.e., we solve: 

maximize min^^-^-i^ m ^ e{ ^ MS -^ fKh{e, mo, mi, So, S±) 
subject to e T e = 1 (14) 
e T e; = 0, i = l,...,j -1. 

Let £ flpxfa-j+i) be a matrix with orthonormal columns which spans the orthogonal complement 
of span {ei, . . . , ej_i}. The restrictions in (14) mean that e = U^e^ for some unit-norm G K n-J ' +1 . 
This means that (14) corresponds to 

maximize mm mo6£( ~ )fco5 -i ); mie£(ffGifclSl -i) faU^h^, m , mi, 5 , Si) 
subject to e^) = 1. 

The problem (15) is equivalent to (16) (see [4]) 

maximize ".in , , rfjn _ A ( , u J ~, /kl^, m { j) , , S^ , s[ j) ) 

(16) 



subject to e^ = 1 



where m (i) = U^m and S* (i) = Sill®, i = 0,1. That is, (16) is simply an instance of (9) 
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in the reduced dimensional space R n for which we have developed a global solution. Algorithm 1 
outlines the overall approach. 

Algorithm 1: Greedy algorithm 
l: for j = 1 to p do 

2: Compute G K nx (™ _ .?+i) (U^ := /„), an orthonormal basis for the orthogonal complement 

of the j — 1 dimensional subspace spanjei, e2, ■ ■ ■ , ej-i} 
3: Compute the projected means and covariances m\ = U^ T mi, Sl j) = U^ T SiU^ for i = 0, 1 
4: Compute = (S^y^S? (S ^)- 1/2 , rn® = (S ij) )~ 1/2 (m? - mf), = m^m^ T 



Solve (10) for (M, S) := (M^,S^); find e& € R n -i +1 as: := e*. as in Lemma 2 
Compute the jth column of E as ej = U^e^ 
end for 



2) Projection phase: Relaxation phase III-A1 produces a Stiefel matrix E. Now, we project the matrix 
E back to the set of 0/1 selection matrices. We remark that the objective function /kl( - ) in eqn. (6) 
depends on the matrix E only through its range space; that is, /kl (EQ) = /kl (E), for any Stiefel 
matrix E and for any orthogonal pxp matrix Q. Thus, we choose the selection matrix E with the range 
space closest to the range space of matrix E. It can be shown (see [3]) that E can be efficiently obtained 
as follows: if (ji,j2, ■ ■ ■ , j P ) denote the indices of the largest entries on the diagonal of EE T , then 
E = [hj 1 hj 2 . . . hj p ] where hj stands for the j-th column of the identity matrix I n . Thus, the Projection 
phase has very small computational cost. 

3) Refinement phase: Once the projection to the set of 0/1 selection matrices is done and the matrix 
E is obtained, we finalize our algorithm with a local maximization around E to get E* (see [14], [19] 
for very similar local searches.) Namely, for a given selection matrix E in the neighborhood of E, we 
find 

/KL, worst (£) := min mo 6£(mo,iko5 - 1 ),m 1 e£(m 1 ,fe 1 5r 1 )^ KL ( jE; ' mo ' mi )- 

The procedure has p steps. We start with the matrix E := E. In the first step, all columns of the current 
selection matrix E are fixed except the first one, which is viewed as an optimization variable. The first 
column is swept through all canonical vectors hj, j = l,...,n, different from the remaining p— 1 
columns of E. After all possible choices for the first column are tested, the column is frozen to the 
choice that gives the maximal /kl, worst- In the second step, this procedure is repeated for the second 
column, and so on, up to the p-th step; after the p-th step is done, we set E* := E. 
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We obtain the quantity /kl, worst (E) by first finding: 

min (mi — mo ) T E{E T S E) V(mi-mo), (17) 

m &£(fho,ko S 1 ), 
miG£(ini,fci S^ 1 ) 

and then adding the remaining terms of the function /kl(-) that do not depend on mi and mo (see eqn. 
(6))- 

Similarly as with the equivalence of (15) and (16), it can be shown that the minimum in eqn. (17) 
(n-dimensional problem) equals the following minimum (of corresponding p-dimensional problem): 

min ||m'i — m ||, (18) 

m'„<=£(Q T (E T SoE)- 1/2 E T m ,ko I p ),m{e£(Q T (E r S E)~ 1/2 : E T mi,fci A- 1 ) 

where A and Q are, respectively, the matrix of eigenvalues and the matrix of eigenvectors of 
(E T S E)- 1 / 2 E T SiE(E J SqE)- 1 / 2 , and || • || denotes Euclidean norm. The problem (18) is a (convex) 
quadratically constrained quadratic problem (QCQP), and it can be solved with complexity 0(p 3 ) (see, 
e.g., [26]). 

B. Algorithm for the robust Chernoff-based selection: the R-C algorithm 

In this subsection, we present the algorithm R-C, the Chernoff based sensor selection under the presence 
of uncertainties. As mentioned previously, we adopt the same methodology for solving both (5) and (4) 
and, consequently, the structure of R-C is the same as the one in R-KL. However, the problem (5) is 
more difficult than (4), due to the additional maximization over the parameter s. This will result in several 
specificities in R-C compared to R-KL. We present R-C by focusing on these specificities, phase by 
phase, whereas the overall structure remains the same as in R-KL. 

The main difference between R-KL and R-C is in the Relaxation phase in the case p = 1. As with 
the KL case, we solve the resulting Chernoff problem globally; we next explain a solution. 

1) Relaxation phase: case p = 1: global solution: Stiefel relaxation of (5) for p = 1 is given by: 



maximize min^^ s -^^ ie£ ^ lM Si -i ) max se[0) i ] / c (a, e, mo, mi) 
subject to e T e = 1, 

where 

, , , a(l-a) (e T (mi-m )) 2 1 ( e T S l e) s (e T 5ie) 1 - s 

/ c (s,e,m ,mi) = =>- — -log 



(19) 



2 se T S* e + (1 - s)e T 5ie 2 b se T S e + (1 - s)e T S ie ' 

The first reformulation of (19) that we make is the conversion of the minimax into maximin problem, as 
Lemma 5 explains. 
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Lemma 5 Problem (19) is equivalent to: 

maximize max^^min^^^ s -^ miee ^ lM s ^)fc( s , e, mo, mi) 
subject to s G [0, 1]. 

Proof: Function fa is convex w.r.t. mo and mi, and concave w.r.t. s, and the constraint sets are 
compact and convex. Thus, the equivalence follows by Sion's minimax theorem ([27]). ■ 
Next, we focus on the inner maximization in (20): 



maximize min^^-^ s -^ mieg ^ lM Si -i)/c(«, e, m , mi) 
subject to e T e = 1 

where s € [0, 1] is fixed. The following lemma is the counterpart of Lemma 2. 
Lemma 6 Suppose (x*, y*) solves 

maximize ipc{ s ,x,y) 
subject to (x, y) <ElZ 



(21) 



(22) 



where 



ipc(s,x,y) = 2 — — s + (i- s ) x — " ^(l-s)logx + ^log(s + (l-s)x), 

and s G [0, 1]. Let f* G M n be an unit norm vector that generates x* and y*, i.e., x* = v* T Sv* and 

y* = v* T mm T v*. Then, e* := S' _1 ^ 2 f*/||S' _1 ^ 2 f*|| solves (21). 

Proof: The proof is similar to the proof of Lemma 2 and is omitted. ■ 
By similar analysis as in subsection III-A1, it can be shown that (22) can be solved by parametrization of 
the boundary of 1Z, given in eqn. (12). Thus, for fixed s, the algorithm that solves (21) is the same as the 
one that solves (9), except that i()kl(x, y) is replaced by Y>c(s, x, y). Then, the function ij)c{s, x*(s),y*(s)) 
can be evaluated using this algorithm and problem (19) is solvable by, e.g., a grid search on the interval 
[0, 1]. 

The projection phase of R-C is the same as the projection phase of R-KL and the steps in the refinement 
phase of R-C are the same as the ones in the refinement phase of R-KL (with /kl{E, mo, mi) replaced 
by max s6 r i] fc(s, E, mo, mi)). Similarly as in III-A3, in the refinement phase, for a given selection 
matrix E, we have to find 

/c,worst(£) := ^moeeifhoMS^m^sifh.MSr 1 ) max / C (S, m , mi). 
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Applying again the minimax theorem, we first exchange the order of min and max in /c, worst; then, for 
fixed s, we find 

min (mi - m ) T E (sE T S E + (1 - s)E T SiE) 1 E T {m l -m ), (23) 

m e£ (fho,k So 1 ), ^ ' 

miE£(jhi,ki S^ 1 ) 

by solving the equivalent p-dimensional QCQP (24) (with equal minimum): 

min (m^ — m ) T (s/ p + (1 — s)A)~ 1 (m' 1 — m ). (24) 

m' a e£(Q T (E T S E)- 1 ' 2 E T m ,k I p ), 

m' l &£{Q T (E T SoE)- 1 ' 2 E T 'fh u ki A" 1 ) 

Finally, we compute /c j worst (E) by the bisection method on parameter s. 

C. Complexities of R-KL and R-C 

The complexity of both R-KL and R-C algorithm is 0(n 3 p + np 4 ), although the hidden constant in 
R-C is larger than the one in R-KL. The least computational effort is required for the projection phase 
2, which for both R-C and R-KL is 0(n 2 ) and is dominated by complexities of the other two phases. 
It can be shown that Phase 1 has complexity 0(n 3 p) and Phase 3 has complexity 0(np 4 ). 

IV. Sensor selection algorithms: No uncertainties case 

In this section, we address a special, yet important case, when there are no uncertainties in the mean 
vectors and the problems (4) and (5) simplify by dropping the inner minimizations. We first remark that 
algorithms R-KL and R-C can readily solve the simplified versions of (4) and (5). However, we derive 
in this section a more efficient algorithm. We exploit the structure of the problem and the knowledge 
of exact distribution parameters m«, i = 0, 1 (more specifically, their difference mi — mo) to reduce the 
computational load of the relaxation phase of R-KL and R-C algorithm, while keeping the second and 
the third phase the same 2 . The key to reducing the complexity of the relaxation phase is a simple, analytic 
solution of the relaxed, Stiefel problem, in the special case of equal mean values. We refer to the overall 
simplified algorithms as MD-KL (mean-difference based KL algorithm), and MD-C (mean-difference 
based C algorithm). 

A. Kullback-Leibler based selection without uncertainties: The me an- difference KL algorithm (MD-KL) 

In this subsection, we only explain the relaxation phase of MD-KL, as the other two phases are the 
same as in R-KL. We first consider a special case of equal mean values of the problem (8) (without 

2 We remark that the refinement phase of R-C and R-KL simplifies significantly in the no uncertainties case, as in this case 
computing /kl, worst (E) boils down to computing /kl(-E), i.e. there is no need to solve intermediate minimization problems 
(see (17) and (23)). 
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inner minimization), and we show that this problem has a simple analytic solution. Based on the solution 
for the equal means, we derive an algorithm that solves the general case. 

1) Relaxation phase: the case mo = mi: Consider the problem (8) when there is no uncertainty in 
the mean values (uncertainty ellipsoids shrink to a point, by letting k®, ki = oo, and inner minimization 
drops from the problem). We first remark that we can write (8) when fcj = oo, mi = mo, in the following 
form: 

maximize \ (tr(P T SP) - log \P T SP\ - p) 
subject to P T P = I p , 

where S = Sq 1 ^ 2 SiSq 1 ^ 2 . This equivalence can be shown by noting that the constraint E T E = I p in (8) 
can be replaced by E T SqE = I p , and by introducing the new variable P = sl /2 E. 

The objective function in (25) can be further simplified to YTi=\ <^kl {Xi (P T SP)), where 4>kl(x) = 
x — log x — 1 and Aj denotes the 2-th largest eigenvalue. Invoking Poincare separation theorem (see [28]), 
we show in the Appendix C that the solution P* of (25) is given by a set of p orfhonormal eigenvectors 
of S corresponding to the p eigenvalues that give the highest objective </>kl( - )- It turns out that we do 
not need to check all (™) eigenvector subsets, but only at most p + 1 of them. We now give the procedure 
in Algorithm 2 to choose the optimal subset of eigenvectors (see Appendix C for the proof). 

Algorithm 2: Procedure for solving (25) when m\ = mo 

1: Set <P* KL = 

2: for j = to p do 

3: X = (X 1 (S), Xj(S), X n -p+j+1 

4: Compute (j) = Ya=i 0kl(^) 
5: if (j) > 0fc L , then f = j, <^ L = 4> 

6: end for 

7: x* = (Ai(5), . . . , Xj*(S), A ra _ p+ j* + i 

P* is the set of eigenvectors of S corresponding to the eigenvalues of S given in x*. 



We give the intuition behind the solution of (25). Recall that the matrix E is chosen such that the 
projection of the covariance matrix So equals E T SoE = I p . Then, the projection of Si, E J S\E, 
equals P T SP. Thus, the further from point 1 are the eigenvalues of P T SP, the better are the projected 
distributions separated. The function </>kl(0 measures the distance from 1 in this sense. 

2) Relaxation phase: general case: The main idea behind the relaxation phase of MD-KL is as 
follows: set one column of the solution Stiefel matrix E in the direction of the vector m\ — mo, i.e., 
in the direction of the difference of the distribution means. The remaining p — 1 columns of E are then 
obtained in the following way: we project the distribution parameters m^, Si, i = 0, 1, to the orthogonal 
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complement of m\ — mo, and then solve an (p — 1 dimensional) instance of (8) when ki = oo, with the 
projected distribution parameters. This p — 1-dimensional instance of (8) is in fact the special one, with 
equal means, and is hence very efficiently solved by procedure given in Algorithm (2). The relaxation 
phase of MD-KL is summarized in Algorithm 3. We give the intuition behind the choice of m\ — mo as 
the direction of the first column of E: the Euclidian distance between the means E T mi and E T mo of 
the projected distributions is maximal possible (and equal to \\m\ — mo||), when one column of E lies 
in the direction of mi — mo- 



Algorithm 3: MD-KL algorithm: Relaxation phase 

1: Set ei := (mi — mo) /||mi — mo\\ 

2: Compute U £ ]R nx ( ra ~ 1 ), an orthonormal basis for the orthogonal complement of ei 
3: Compute S' = (U T S Uy 1/2 U T Si U (U T S U)~ 1/2 
4: Find P as in Algorithm 2 for S := S', and p := p — 1 

1/2 

5: [e 2 , e 3 , ...,e p ] := U S Q P, E := [ei, e 2 , e p \ 



B. Chernoff based selection without uncertainties: The mean-difference C algorithm (MD-C) 

1) Relaxation phase: the case mo = m\: The counterpart of problem (25) for the C criterion is the 
following: 

maximize max se[0il] - log \J^ x t%,' SP \ (2g) 
subject to P T P = I p 

The objective in (26) can be written as Yn=i {s, Aj [P T SP)), where <j>c(s, x) := log(s + (1 — s)x) — 
(1 — s) logx. Now, the problem (26) (mi = mo) can be solved by a method similar to the one in IV-A1. 
Namely, 1) by invoking Poincare separation theorem; 2) by using concavity of 4>c{-,x); and 3) by using 
unimodularity of (f>c(s, •), it can be shown that the solution to (26) (m\ = mo) can be obtained by the 
procedure given in Algorithm 4. 

General case of the relaxation phase of MD-C algorithm is the same as the one in MD-KL given in 3, 
except that, in step 4), it calls procedure given by Algorithm 4. 

C. Complexities of MD-KL and MD-C 

We briefly comment on the complexity of MD-KL and MD-C. The complexity of both MD-KL and 
MD-C is 0(n 3 + np 3 ), although the hidden constant in MD-C is larger than the one in MD-C. The main 
computational burden in the relaxation phase of MD-KL is computation of the orthogonal complement U 
of mi — mo, and subsequent eigenvalue decomposition of the supplementary matrix S (see Algorithm 3), 
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Algorithm 4: Procedure for solving (26) when mi = mo 
1: Set 4>* c = 
2: for j = to p do 

3: x = (Ai(5), . . . , Aj(S'), A n _ p+j+ i(S'), . . . , A n (5')) 

4: Find s{x) £ arg max sg r 01 i X)i=i 0c(S)^i) by Newton method 

5: Compute ^ = £^ =1 c (s(x),Xj) 

6: if > then j* = j, s* = s{x), 4>c = 4> 

7: end for 

8: x* = (Ai(S-), . . . , A,, (5), A n _ p+J -. +1 (S), . . . , A„(S)); = s*; 

P* is the set of orthonormal eigenvectors of S corresponding to the eigenvalues of S given in x*. 



which is in total of order 0(n 3 ). The relaxation phase of MD-C requires more computational effort; 
besides finding U and the eigenvalue decomposition of S, each of the p — 1 steps in Algorithm 4 requires 
finding optimal s by Newton method (contrary to just evaluating the KL distance for a given choice of 
p — 1 eigenvalues of S with Algorithm 2). However, the number of operations in p — 1 Newton runs is 
still dominated by the number of operations to find U. Therefore, the complexity of the relaxation phase 
of MD-C is 0(n 3 ). Finally, it can be shown that, for both MD-KL and MD-C, the refinement phase is 
of complexity 0(np 3 ). 

V. Numerical studies: Testing the optimization criteria 

This subsection tests how good are the Kullback-Leibler and the Chernoff distance as optimality criteria 
for sensor selection. To this end, we want to compare the sensor selections that optimize the KL and C 
distances with: 1) the sensor selection that minimizes the Bayes probability of error (Bayes optimality); 
2) the sensor selection that minimizes the probability of miss subject to a given probability of false alarm 
(Neyman-Pearson optimality). We find numerically the Bayes optimal and the Neyman-Pearson optimal 
sensor selections by Monte Carlo simulations. With respect to Bayes optimality, for each possible (out 
of (p) sensor selection, we estimate the Bayes probability of error P c by Monte Carlo simulations with 
100, 000 instantiations of the maximum-likelihood detector tests (with zero treshold), with equal prior 
probabilities. With respect to Neyman-Pearson optimality, for the fixed probability of false alarm PpA, we 
find the sensor selection that maximizes the probability of detection Pq- This is achieved by estimating 
the receiver operating characteristic (ROC) curve (by Monte Carlo simulations with 20, 000 instantiations 
of the likelihood ratio tests) in the neighborhood of PpA in a fine grid, and then interpolating the ROC 
curve (i.e., Pq) at the desired point PpA- This is done for each possible selection and the selection with 
maximal obtained Pq is set as Neyman-Pearson optimal. 
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Table I shows the Bayes probability of error for: 1) the sensor selection that maximizes KL distance 
(KL); 2) the sensor selection that maximizes C distance (C); 3) the best sensor selection (that minimizes 
P e ); 4) the worst sensor selection (that maximizes P e ); and 5) the average P c over all selections. Table I 
(left) is for n = 12 and p = 2, 3, 4, 5 and Table I (right) is for n = 15 and p = 2,3, 4, 6. We can see 
that C selection matches the best selection. KL selection is in many cases very close or equal to C and 
best selections in P e . As could be predicted by theory (Chernoff lemma), the C selection is better than 
the KL selection in terms of P c . 

Comparison of: 1) C, KL and best selections; with 2) worst and average selections justifies the sensor 
selection problem; namely, by finding the optimal selection, P c can be for an order of magnitude smaller 
than for the average selection. (See Table 1 and Figure 1-right.) 

Table II shows the probability of detection for: 1) the KL-optimal selection (KL); 2) the C-optimal 
selection (C); and 3) the Neyman-Pearson (NP) optimal selection, for n = 12, p = 2, 3, 4, 5. We can see 
that, for p = 2 and p = 3, both KL and C selections match the NP optimal selections, and for p = 3 and 
p = 4, Pd for KL and C selections is at most 3.5% from the optimum (p = 4, Pd = 0.05). 

Figure 1 (left) plots the ROC curves for all possible selections, for n = 5 and p = 2. We plot the ROC 
curves for: 1) KL-optimal selection; 2) C-optimal selection; 3) the pointwise envelope of all possible 
curves (Neyman-Pearson optimal). Remark that the Neyman-Pearson optimal curve (envelope) is not 
obtained for a single selection; in different regions, it corresponds to different selections. We can see that 
for lower values of Pfa, KL selection is optimal; for higher values of Pd, C-selection is optimal. 

Figure 1 plots the ROC curve for all possible (™) = 1365 selections for a larger example, with n = 15 
and p = 4. Interestingly, we can see that the KL and C selections are very close to the optimum, in 
whole range of Pfa- In addition, we plot the average of the ROC curves (pointwise average of Pd for 
each fixed Pfa)- This average curve thus represents what performance would be, on average, achieved, 
if we choose a subset of sensors uniformly at random. We can see that there is a a large gain of the 
C and KL selections over this average curve; thus, selecting the optimal, rather than random subset of 
sensors, provides large performance gain. 

Finally, we remark that, in extensive simulations, we observe similar behavior as in representative 
Tables I and II, and Figure 1 (left and right). That is, the KL and C selections are very close to optimal 
and even equal to optimal in certain range of Pfa- We also report that C-selection is generally better 
than KL for large Pq's (upper right part of the ROC curve,) while KL is generally better for low Pfa 
(lower left in the ROC). Improvement of KL over C for low Pfa is smaller than the improvement of C 
over KL for large Pd- 
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Fig. 1. Left: n = 5, p — 2; ROC curves for all possible sensor selections (gray); the ROC curves for the KL-optimal and 
the C-optimal sensor selections, as well as the Neyman-Pearson optimal "envelope" are represented in different color. Right: 
n = 15, p = 4, Figure plots the same ROC curves as on the left; in addition, the Figure plots the average ROC among all 
sensor selections. 

TABLE I 

P e FOR KL, C AND OPTIMAL SELECTION; LEFT: n = 12, p = 2, 3, 4, 5; RIGHT: n = 15, p = 2, 3, 4, 6 





p = 2 


p = 3 


p = 4 


p = 5 




p = 2 


p = 3 


p = 4 


p = 6 


KL 


0.100 


0.078 


0.052 


0.046 


KL 


0.086 


0.061 


0.037 


0.022 


C 


0.100 


0.078 


0.052 


0.043 


C 


0.085 


0.051 


0.029 


0.012 


Bayes-best 


0.100 


0.078 


0.052 


0.043 


Bayes-best 


0.085 


0.051 


0.029 


0.012 


worst 


0.457 


0.439 


0.396 


0.319 


worst 


0.480 


0.440 


0.396 


0.311 


average 


0.275 


0.216 


0.170 


0.134 


average 


0.240 


0.180 


0.136 


0.077 



VI. Numerical studies: Testing the algorithms 

This subsection tests our algorithms for solving the sensor selection problem. Subsection VI-A shows 
that algorithms R-KL and R-C show near optimal performance in the presence of uncertainties (i.e., 
for solving the problems (4) and (5)), Subsection (VI-B) shows that R-KL and R-C have near optimal 

TABLE H 

Pd for Pfa = [0.005 0.03 0.1], n = 15, p = 2, 3, 4, 5 for KL, C and optimal selection 



P fa = 0.005 P FA = 0.03 P FA = 0.1 





Pd 


KL 


C 


NP 


KL 


C 


NP 


KL 


C 


NP 


p 


= 2 


0.720 


0.720 


0.720 


0.889 


0.889 


0.889 


0.961 


0.961 


0.961 


p 


= 3 


0.812 


0.812 


0.812 


0.941 


0.941 


0.941 


0.985 


0.985 


0.985 


p 


= 4 


0.868 


0.838 


0.868 


0.962 


0.967 


0.970 


0.990 


0.995 


0.995 


p 


= 5 


0.891 


0.898 


0.903 


0.968 


0.981 


0.983 


0.991 


0.994 


0.996 
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performance when also applied to the case of no uncertainties (i.e., when ko = k\ = oo in (4) and (5)). 
This subsection also shows that MD-KL (resp. MD-C) has comparable performance to R-KL (resp. 
R-C), while reducing the computational time. 

For smaller problem instances, i.e., for smaller values of n and p, we compare the solutions produced by 
our algorithms with the optimum obtained by exhaustive search. For larger examples, when the optimum 
is infeasible to compute, we generate randomly a number of sensor selections; then, we compare the 
selection obtained by our algorithms with the best among generated random selections. 

For a fixed size of the problem (for fixed pair (n,p)), we generate randomly K expeT instances of the 
problem parameters (mi, Si, i = 0, 1); for the case with uncertainties, K cxpci = 50, and for the case 
without uncertainties, K expcr = 200. In the case with uncertainties, parameters ko and k\ are chosen 
such that the norm of the mean vector drift does not exceed 0.15||mi — mo||, i.e., we set them as 
ki = \Si\ I (0.15||mi —mo||) 2 , i = 0, 1. Based on i^exper solved problems of fixed size (n,p), we 
create statistics on how our algorithms behave for given size of the problem. 

A. Testing the R-KL and R-C algorithms: The case with uncertainties 

Testing the robustness against the uncertainty in distribution means. Table III (left) shows perfor- 
mance of the R-KL and R-C algorithms, for solving (4) and (5), respectively. We evaluate the optimal 
solutions y OPT - KL an d yOPT-c ^ exhaustive search. We then compute the ratio tr_kl := jopt-kl 

i-R-C 

(resp. tr_c := jopt-c ) that says how close is the solution value obtained by R-KL (resp. R-C) to 
optimum. Table III (left) shows the results for n = 10, 12, 15 and p = 3. We present the maximum 
(max), the average (avg), the minimum (min), and the standard deviation (dev) of the ratio tr_kl over 
-fQxper = 50 problem instances, for each pair (n,p). We can see that both R-KL and R-C show very 
good performance; the R-C shows better performance than R-KL. The maximal value of both tr_kl 
and tr-c in ai l the examples is equal to 1. With the R-KL algorithm, the average value of tr_kl is 
always above 91.8%, and the minimum value is always above 51%; with the R-C algorithm, the average 
of tr_c is always above 96%, and the minimum is always above 59%. 

Table III (right) shows performance of R-KL and R-C for a larger example n = 50, p = 5, for which it 
was infeasible to find the optimal solution by exhaustive search. We thus randomly generate 2500 sensor 
selections, and we evaluate the quantity y BE ST-RAND-KL_ tne ma ximal objective function in (4) over all 

fE-KL 

2500 randomly generated selections. Define the ratios pr_kl := jbest-rand-kl ■ (Analogously define the 
corresponding ratios for the Chernoff43ased sensor selection.) We report that, to obtain j be st-RAND-kl 
(for 2500 50 x 5 selection matrices) it takes about 10 times longer than for R-KL algorithm to find a 
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TABLE III 

Left: Statistics for r R _ K L and r R ,_c, smaller examples: n = 10, 12, 15, p = 3, ifexper 



50; Right: Statistics 



for pr-kl and pb.-c, n = 50, p = 5, JC e: 



50 





max 


avg 


min 


dev 












m-KL 




















n = 10 
n = 12 


1.000 
1.000 


0.964 
0.918 


0.606 
0.551 


0.090 
0.140 




max 


avg 


min 


dev 












n = 15 


1.000 


0.939 


0.512 


0.118 


PR-KL 


1.777 


1.267 


0.817 


0.191 


m-c 










PR-C 


1.868 


1.277 


1.005 


0.166 


n = 10 


1.000 


0.981 


0.7862 


0.051 












n = 12 


1.000 


0.982 


0.834 


0.037 












n= 15 


1.000 


0.961 


0.595 


0.095 













solution. Also, to obtain j be st-RAND-c j t ta ^ es aD out 6 times longer than for R-C algorithm to find a 
solution. From Table III (right) we can see that R-KL in many cases outperforms random strategy: with 
significant savings in time (90%) it gives a 26% larger objective function on average. R-C always stays 
above the random strategy in terms of the objective function, with computational savings of 83%. 

B. Testing the algorithms: The case without uncertainties 

Smaller examples: Comparison with the global optimum. Table IV shows performance of the R-KL 
and MD-KL algorithms for solving (4) in the case without uncertainties (ko = k\ = oo); table V shows 
performance of the R-C and MD-C algorithms for solving the corresponding Chernoff problem (4). 
For fixed n and p, we generate K exper = 200 sets of problem parameters mi,Si,i = 0,1; for each 
fixed rrii,Si,i = 0,1, we evaluate the ratios tr-kl, Hvld-kl, ?"r-C> an d ?*md-c- Table IV shows the 
maximum (max), the average (avg) and the minimum (min) of the quantities tr_kl and tmd-kl over 
all K cxpci = 200 experiments. The standard deviation in these experiments varies from 0.03 to 0.06. We 
can see from the table that the maximum of both tr_kl and tmd-kl is always 1. Table IV demonstrates 
very good performance of both R-KL and MD-KL algorithms. On average, tr_kl and tmd-kl are 
always (for each pair n,p) above 97.7%; the minimum is always above 67.2%. We see that R-KL and 
MD-KL algorithms have comparable performance with respect to (near)optimality, while MD-KL has 
smaller computational cost (We note, however, that MD-KL does not apply for the case with uncertainties 
in distribution means.) 

Table V shows very good performance of the R-C algorithm and the MD-C algorithm. As with the 
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1.000 
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0.789 


1 000 


0.985 


0.688 


1 000 
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II = 


30 


1.000 


0.989 


0.830 


1.000 


0.988 


0.826 


1.000 


0.983 


0.795 


n = 


40 


1.000 


0.989 


0.729 


1.000 


0.985 


0.842 


1.000 


0.983 


0.817 


TMD- 


KL 




















n = 


20 


1.000 


0.992 


0.744 


1.000 


0.982 


0.688 


1.000 


0.975 


0.672 


n = 


30 


1.000 


0.989 


0.809 


1.000 


0.987 


0.832 


1.000 


0.981 


0.742 


n = 


40 


1.000 


0.985 


0.729 


1.000 


0.980 


0.802 


1.000 


0.981 


0.834 



TABLE V 

STATISTICS FOR r R _ c = / R - C / '/OPT-C AND ^ ^ = jMD-C/y 
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n = 


20 


1.000 


0.994 


0.831 


1.000 


0.992 


0.789 


1.000 


0.995 


0.850 


n = 


30 


1.000 


0.997 


0.880 


1.000 


0.997 


0.868 


1.000 


0.995 


0.883 


n = 


40 


1.000 


0.998 


0.959 


1.000 


0.995 


0.936 


1.000 


0.997 


0.959 
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n = 


20 


1.000 


0.997 


0.835 


1.000 


0.995 


0.874 


1.000 


0.996 


0.918 


n = 


30 


1.000 


0.995 


0.874 


1.000 


0.997 


0.892 


1.000 


0.995 


0.928 


n = 


40 


1.000 


0.998 


0.931 


1.000 


0.994 


0.933 


1.000 


0.994 


0.953 



case of KL algorithms, the maximum is always 1. The average of both quantities tr_c and tmd-c is 
always above 99.2%, and the minimum is always above 78.9%. Standard deviation is smaller than the 
one in R-KL and varies between 0.01 and 0.026. Thus, R-C and MD-C show closer near optimality 
than R-KL and MD-KL. 

Larger examples: Comparison with the best randomly generated selection. We now consider larger n 
and p, when computing the optimum by the exhaustive search is infeasible. Similarly as in subsection VI-A 
(larger examples), we randomly generate 10 5 sensor selections and find the maximal objective function 
yBEST— RAND— kl oyer ^ggg selections. For a fixed pair n and p, we generate K cxpcr = 100 sets of the 
distribution parameters m^, Si, i = 0, 1; for each set of the distribution parameters, we evaluate /9r_kl, 
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TABLE VI 

pR-KL / j?BEST— RAND— KL _ _ (-MD-KL / /BEST — RAND— KL 



STATISTICS FOR p R _ K L = f«-^/f^ -hand-^ and pMD _ KL = f MU-^ /f t 
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= 0.1 x 


n 


P 


= 0.2 x 
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avg 


max 
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max 


min 
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PR 
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n = 


= 50 


1.069 


1.246 


0.855 


1.225 


1.577 


1.072 


1.274 


1.733 


1.071 


n = 


= 80 


1.298 


1.544 


1.162 


1.474 


1.860 


1.233 


1.472 


1.962 


1.229 


n = 


100 


1.427 


1.705 


1.169 


1.556 


1.739 


1.337 


1.598 


1.894 


1.378 


Pmd 


-KL 




















n = 


= 50 


1.072 


1.246 


0.919 


1.223 


1.464 


1.002 


1.265 


1.684 


1.077 


n = 


= 80 


1.299 


1.544 


1.130 


1.475 


1.782 


1.248 


1.468 


2.002 


1.225 


n = 


100 


1.429 


1.705 


1.169 


1.563 


1.864 


1.384 


1.617 


1.973 


1.374 



Pmd-kl, and /9r_c, Pmd-C- We are also interested in the ratio of the computational time of the R- 
KL (resp. MD-KL) algorithm over the computational time of checking 10 5 random selections (and the 
analogous quantities for the Chernoff-based selection). 

Table VI shows the average (avg), the maximal (max), and the minimal (min) values of pr_kl and 
Pmd-kl over 100 data sets nii, Si, i = 0, 1; Table VII (left) shows the computational time ratios. We 
can see that both R-KL and MD-KL outperform best random selection strategy, as they achieve larger 
objective function while reducing computational time. For example, for n = 80, p = 0.1 x n = 8, R-KL 
can give 54% larger objective function, while it has 3 times smaller computational time. MD-KL is better 
than R-KL, as it shows comparable performance in terms of the objective function; at the same time, it 
significantly reduces the computational time. Thus, for sensor selection without uncertainties, MD-KL 
is a very good tool that can handle large problem instances. Similar conclusions hold for the Chernoff 
based counterpart algorithms (see Table VII and Table VIII (right)). Chernoff based algorithms have 
larger computational time than Kullback-Leibler based algorithms, which is expected due to additional 
maximization over variable s (see (5)). Section V shows generally that Chernoff criterion has some 
advantage over Kullback-Leibler criterion, which trades off the computational requirements to solve for 
these criteria. 

VII. Conclusions 

In this paper, we addressed the problem of finding the most informative subset of p out of n sensors, 
for the task of deciding between the two possible hypothesis on the monitored environment. We proposed 
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TABLE VII 

STATISTICS FOR p R _ C = /R-C/jBEST-RAND-C AND pMD _ c = / MD-C // BEST-RAND-C 

p = 0.1 x n p = 0.2 x n p = 0.3 x n 

avg max min avg max min avg max min 

PR-C 

ra = 50 1.074 1.214 0.992 1.182 1.341 1.072 1.194 1.307 1.095 

n = 80 1.262 1.454 1.132 1.357 1.621 1.199 1.338 1.464 1.206 

n = 100 1.375 1.517 1.251 1.447 1.616 1.299 1.402 1.629 1.287 

Pmd-c 

ra = 50 1.074 1.214 0.991 1.182 1.341 1.038 1.195 1.307 1.100 

n = 80 1.262 1.471 1.132 1.357 1.621 1.194 1.338 1.464 1.212 

n = 100 1.375 1.517 1.254 1.445 1.616 1.296 1.403 1.632 1.296 

TABLE VIII 



p = 10% p = 20% p = 30% p = 10% p = 20% p = 30% 



R-KL R-C 

n = 50 0.070 0.104 0.107 n = 50 0.105 0.146 0.153 

n = 80 0.276 0.295 0.291 n = 80 0.472 0.549 0.507 

n = 100 0.417 0.472 0.349 n = 100 0.875 0.990 0.721 

MD-KL MD-C 

n = 50 0.003 0.005 0.006 n = 50 0.002 0.004 0.006 

n = 80 0.007 0.012 0.014 n = 80 0.006 0.011 0.014 

n = 100 0.011 0.017 0.021 n = 100 0.010 0.017 0.021 



two different information theoretic criteria for the best sensor selection: the Kullback-Leibler distance 
and the Chernoff distance between the distributions induced by selected sensors. We tackled the case 
where the distributions are Gaussian, but the mean vectors are known only up to confidence regions. 
We formulated the corresponding maxmin optimization problems, and developed the R-KL and R-C 
algorithms, that efficiently solve the problems with complexity 0(n 3 p + np A ). We also addressed the 
case when the mean vectors are known and, for this case, we exploited the structure of the problems 
to develop more efficient algorithms, MD-KL and MD-C, of complexity C(n 3 + np 3 ). We performed 
Monte-Carlo based experiments to test both the proposed sensor selection criteria and the sensor selection 
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algorithms. Numerical studies of the criteria show that the Kullback-Leibler based and the Chernoff based 
selections have near optimal performance, both in the Neyman-Pearson and Bayes sense. The performance 
of our algorithms we compared with 1) the optimal sensor selections, when the exhaustive search to 
compute them is feasible (smaller n and p); and 2) with best random selections, when n and p are large. 
Comparison with the exhaustive search shows that proposed algorithms in many cases find the optimal 
selection and, on average, are at most 5% below the optimal value; at the same time, computational savings 
are significant. For larger problems, simulation results demonstrate that our algorithm outperforms random 
searches, once an upper bound on computational time is set. 

Appendix 

A. Proof of NP hardness of optimization problems (4) and (5) 

We will prove that both problems (4) and (5) are NP hard by reducing them to the maximal clique 
problem (MQP), which is known to be NP hard, see, e.g., [29]. We first define the MQP Consider 
an undirected, simple (i.e., without self-loops) graph Q = (N,£), where M is the set of vertices with 
cardinality \J\f\ = n, and £ is the set of undirected edges {i, j}, \£\ = m. A clique of the graph Q, of 
size p, is a complete subgraph of Q that has p vertices. The decision version of the maximum clique 
problem is as follows: 

MQP: The maximal clique problem: "For given graph Q and a positive integer p, 1 < p < n, 
determine whether Q has a clique of size at least p" 

We conduct a reduction to MQP by attaching to a graph Q anxn matrix S (Q) of a special structure. 
Namely, we define a positive definite matrix S (Q) as follows: 

I In ; if i = j 
-1 ;ifi/j, {i,j}e£ ■ (27) 
; otherwise 

The matrix S {Q) is positive definite because it has positive diagonal elements and it is strictly diagonally 
dominant. Now, fix an integer p, 1 < p < n, and consider a set of matrices A p defined as A p = 
{A £ W xp : A = A T , An = 2n, Vi, A^ £ {0, -1}, i / j}. Clearly, all matrices in A p are positive 
definite, as they are strictly diagonally dominant, with positive diagonal entries. Denote with l p the column 
vector with all entries equal to 1 and define the function s.e.i. : A p — > R as s.e.i.(A) = lj A~ 1 1 P (s.e.i. 
is the sum of the elements in the inverse.) We have the following result on the matrices in A p . 

Lemma 7 For all matrices A G A p , there holds: 
1) Denote B := A -1 . Then, for all i,j Bij > 0. 
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2) If Aij = 0, then s.e.i. (^A — hihj — hjhj} > s.eA.(A). (Recall that hi denotes 2th canonical vector.) 

3) s.e.i. (A) < 2n ~p+i ' where the equality holds if and only if A = A* := 2nl - 11 T + 1. 

Proof: The claim 1) in Lemma 7 follows from the fact that the matrix A is an M matrix [30], and, 
consequently, the matrix B = A~ l has all entries greater than or equal to zero (e.g., [30]). We now show 
the claim 2) in Lemma 7. Remark first that the function s.e.i.(-) is convex and differentiable on the set 
of positive definite matrices. Applying the first order Taylor expansion lower bound at A, we get: 

s.e.i. [A - hihj - hjhj} > s.eA.(A) + tr (Vs.e.i.(v4) (-hihj - hjhj}} , (28) 

where Vs.e.L(A) stands for the (matrix form) gradient of s.e.i.(-) at A and is equal to —A~ l l p lJ A^ 1 . 
Now, by claim 1), all entries of A -1 are nonnegative, and, thus, the second term on the right hand side 
of the inequality (28) is nonnegative as well. This completes the proof of claim 2. 

We proceed and prove the claim 3) in Lemma 7. From claim 2) we know that the more — l's a matrix 
A G A p has on its off-diagonal entries, the higher the value s.e.i. (A) can be. Therefore, A* is a maximizer 
of s.e.i. over the set A p . Also, it is straightforward to check that s.e.i.(A*) = 2n - P +i • ^ e w ^ show next 
that A* is in fact the only maximizer of s.e.i. (over Ap). To show this, it suffices to show that, for any 
choice of 1 < i, j < p, i ^ j, the following strict inequality holds: 

s.e.i. (A* + hihj + hjhj^j < s.e.i.(A*). (29) 

To this end represent the matrix hihj + hjhj as hihj + hjhj = HCH T , where H = [hi hj] G R Ttx2 
and C G IR 2x2 , Cu = C\i = 1, C\\ = C22 = 0. Using the matrix inversion lemma, we get: 

s.e.i. (V + hihj + hjhj} = ljB"l p - 1 P B*H (c^ 1 + H t B*h} 1 H T B*l p . (30) 

After straightforward algebra, we obtain: 

1 P B*H ( C- 1 + H T B*H) H t B*1 p = 2n + 1 

Since this term is greater than zero for all 1 < p < n, the inequality (29) follows. This completes the 

proof of claim 3) and the proof of the Lemma. ■ 
We proceed with the proof of Theorem (1). The decision version of (4), for k\ = k$ = +00, is: 
D-KL: Decision version of (4) "For given data: 1) vectors mo, mi G R n ; 2) positive definite matrices 

So and Si; 3) positive integer p, p < n; and 4) a number /£ L , determine whether there is a n x p sensor 

selection matrix E, such that /kl(-E) defined in eqn. (6) is at least /kl" 

We now reduce the MQP to D-KL. Consider a simple, undirected graph Q and consider MQP of 
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determining whether there is a clique in Q of the size at least p. Define the matrix S{Q) as in eqn. (27). 
Consider an instance of D-KL, for some fixed p, with the following data: 1) m\ = l n , mo = n ; 2) 
S± = S = S(G); 3) p; and 4) /£ L = \ 2n p p+1 ■ Now, the answer to D-KL is YES (resp. NO) if and 
only if Q has (resp does not have) a clique of size at least p. Thus, MQP problem is reduced to D-KL. 

MQP problem reduces to the decision version of problem (5) (denoted by D-C), for k\ = ko = 
+oo, in a very similar way as it reduces to D-KL, by considering the instance of D-C with the data 
mi, m , Si, S ,p, same as we considered for D-KL, but /£ L = \ 2n _ p p+1 is replaced by /£ = \ 2n f p+1 ■ 

B. Proof that the Kullback-Leibler and Chernoff distances are not submodular functions 

See [23] for the definition of a submodular function. Consider two Gaussian distributions N(mi,Si), 
i = 0, 1, with parameters mo = m\, So = I 3 and S± = I 3 + efahj + h 3 hJ,) Q12, h 3 G M 3 ), where 
< e < 1. Let = hi G R 3 , E 13 = [h x h 3 ], E 12 = [/ii ^2] and E l23 = I 3 . 

Computing the Kullback Leibler distance for selections E\, E\ 2 and E\ 3 we get /kl(-Ei) = /kl(-E-12) = 
/kl(^13) = 0, whereas f K L(E 123 ) = -|log(l - e 2 ) > 0. Thus, / KL (£i 3 ) - /kl(#i) < /kl(^12 3 ) - 
/kl(-Si2) which proves that function /kl is not submodular. The proof for the case of Chernoff distance 
can be done in similar way. 

C. Solution to problem (25) 

As explained in subsection IV-A1, the cost function can be written as \ Yli=i ^kl (A« (P t SP)), 
where ^kl(^) = x — logx — 1. Now, invoking Poincare separation theorem, we can write (25) as: 

maximize \ Ya=\ ^kl(^) 

subject to Xi(S) <Xi< X n - p+i (S), i = 1, ...,p 

\j 1 ) 

X\ < X 2 < . . . < Xp 

3P e R nxp s.t .P T P = I p and Xi = Xi(P T SP) for i = 1, ...,p. 

Next, we relax the problem (31) in the sense that we do not require that variables Xi are generated via 
the Stiefel matrix P: 

maximize \ Yn=i ^kl(^) 

subject to Xi(S) < xi < X n - p+ i(S), i = 1, ...,p (32) 
x\ < X2 < • • • < x p . 

We now focus on problem (32). Function ^>kl(0 is convex, with minimum equal to zero attained 
at point 1. Thus, its maximum over a compact interval is always at a boundary point. Also, if both 
boundary points are greater (resp. smaller) than 1, then the maximizer is the right (resp. left) boundary 
point. Therefore, if Xi(S) > 1, then optimal Xi equals X n ^ p+ i(S) > 1. Similarly, if X n - p+ i(S) < 1, 
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optimal Xi equals Xi(S). If we rule out all eigenvalue intervals that fall in either of the two previous 
categories, we are left with intervals where 1 is in their interiors. We now focus on those eigenvalue 
intervals, i.e. on those i = l,...,p s.t. Aj(S') < 1 and X n - p+ i(S) > 1. It can be shown that there 
will exist a switching index Switch £ {0,1,..., p} such that for all indices on the left of (or equal 
to) Witch* optimal Xi is on the left boundary point, while for all indices on the right of Switch op- 
timal Xi is on the right boundary point. Summing up, a solution x* of (32) will be always of the 
form x* = (Ai(S), . . . , X iswitch (S), X n - p+iamtch+1 (S) , . . . , X n (S)) T . (Case j switc h = corresponds to 
x* = (Xn- P+ i(S), . . . , Xn(S)) T .) 

We now construct solution of (31) from x*-solution of (32); x* consists of a subset of eigenvalues of 
the matrix S; moreover, it is given by x* = (X 1 (S),..., A iswitch (S'), A n _ p +i switch +i(S'), . . . , X„(S)) T , for 
some i sw itch G {0, 1, ■■■,p} (that must be determined). Therefore, there exists a matrix P* that generates 
x*, and hence solves (31). The columns of the matrix P* are simply orthonormal eigenvectors of S 
corresponding to the eigenvalues Ai(5), . . . , A iswitch (S) , X n - p+isysitch+1 (S) , X n (S). 

We remark that the solution P* of (25) is obtained in at most p + 1 steps. The number of steps is 
smaller when some of the interlacing eigenvalue intervals do not contain 1. If the number of such intervals 
is k, then the number of steps that our algorithm requires is exactly p — k + 1. 
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